NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective

Ba, Jimmy; Erdogdu, Murat A; Suzuki, Taiji; Wang, Zhichao; Wu, Denny (December 2023, Advances in Neural Information Processing Systems 36 (NeurIPS 2023))
Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S (Ed.)
We consider the problem of learning a single-index target function f∗ : Rd → R under the spiked covariance data: f∗(x) = σ∗ √ 1 1+θ ⟨x,μ⟩ , x ∼ N(0, Id + θμμ⊤), θ ≍ dβ for β ∈ [0, 1), where the link function σ∗ : R → R is a degree-p polynomial with information exponent k (defined as the lowest degree in the Hermite expansion of σ∗), and it depends on the projection of input x onto the spike (signal) direction μ ∈ Rd. In the proportional asymptotic limit where the number of training examples n and the dimensionality d jointly diverge: n, d → ∞, n/d → ψ ∈ (0,∞), we ask the following question: how large should the spike magnitude θ be, in order for (i) kernel methods, (ii) neural networks optimized by gradient descent, to learn f∗? We show that for kernel ridge regression, β ≥ 1 − 1 p is both sufficient and necessary. Whereas for two-layer neural networks trained with gradient descent, β > 1 − 1 k suffices. Our results demonstrate that both kernel methods and neural networks benefit from low-dimensional structures in the data. Further, since k ≤ p by definition, neural networks can adapt to such structures more effectively.
more » « less
Full Text Available
An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias

Lu, Yu; Balasubramanian, Krishnakumar; Volgushev, Stanislav; Erdogdu, Murat A. (December 2021, Advances in neural information processing systems)

Structured non-convex learning problems, for which critical points have favorable statistical properties, arise frequently in statistical machine learning. Algorithmic convergence and statistical estimation rates are well-understood for such problems. However, quantifying the uncertainty associated with the underlying training algorithm is not well-studied in the non-convex setting. In order to address this short-coming, in this work, we establish an asymptotic normality result for the constant step size stochastic gradient descent (SGD) algorithm—a widely used algorithm in practice. Specifically, based on the relationship between SGD and Markov Chains [1], we show that the average of SGD iterates is asymptotically normally distributed around the expected value of their unique invariant distribution, as long as the non-convex and non-smooth objective function satisfies a dissipativity property. We also characterize the bias between this expected value and the critical points of the objective function under various local regularity conditions. Together, the above two results could be leveraged to construct confidence intervals for non-convex problems that are trained using the SGD algorithm.
more » « less
Full Text Available
Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms

Camuto Alexander; Deligiannidis, George; Erdogdu, Murat A.; Gurbuzbalaban Mert; Simsekli, Umut; Zhu, Lingjiong (January 2021, Advances in neural information processing systems)

Full Text Available

Search for: All records